Query Optimization in Context of Pseudo Relevant Documents

Authors

  • Ashish Kishor Bindal
  • Sudip Sanyal
Abstract

In the conventional vector space model for information retrieval, the generated query vector is often too imprecise to retrieve exactly the documents the user wants. In this paper, we present a stochastic approach for optimizing the query vector without user involvement. We explore the document search space using particle swarm optimization and exploit the search space of possibly relevant and non-relevant documents to adapt the query vector. The proposed method improves retrieval accuracy by optimizing the query vector generated in the conventional vector space model under various term weighting strategies, including TF-IDF and document length normalization. Our experimental results on two collections, Medline and Cranfield, show that the query vector adapted in the context of pseudo relevant documents performs better than the classical vector space model. We achieved an improvement of 3-4% in Mean Average Precision (MAP) and 5-10% in precision at lower recall levels. Further expanding the search space to pseudo non-relevant documents does not lead to significant improvement, but a proper representation of pseudo non-relevant documents leaves scope for future work to better guide the optimization of the query vector.
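The abstract does not state the fitness function or the swarm parameters, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the pseudo relevant set is the top-k documents under the initial query, that fitness is the mean cosine similarity between a candidate query vector and those documents, and that a standard global-best particle swarm is used. The names `pso_adapt_query` and `fitness` and the values of `k`, `w`, `c1`, and `c2` are all hypothetical choices made for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fitness(q, pseudo_rel):
    # Assumed objective (not from the paper): mean cosine similarity
    # between the candidate query and the pseudo relevant documents.
    return sum(cosine(q, d) for d in pseudo_rel) / len(pseudo_rel)


def pso_adapt_query(query, docs, k=2, n_particles=10, iters=30,
                    w=0.7, c1=1.5, c2=1.5, seed=0):
    """Adapt a query vector with a global-best PSO (illustrative only)."""
    rng = random.Random(seed)
    # Pseudo relevant set: top-k documents ranked by the initial query.
    ranked = sorted(docs, key=lambda d: cosine(query, d), reverse=True)
    pseudo_rel = ranked[:k]
    dim = len(query)

    # The first particle is the original query; the rest are small
    # non-negative perturbations of it.
    pos = [list(query)]
    for _ in range(n_particles - 1):
        pos.append([max(0.0, x + rng.uniform(-0.1, 0.1)) for x in query])
    vel = [[0.0] * dim for _ in range(n_particles)]

    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, pseudo_rel) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social.
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                # Term weights are kept non-negative.
                pos[i][j] = max(0.0, pos[i][j] + vel[i][j])
            f = fitness(pos[i], pseudo_rel)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, pseudo_rel


# Toy usage: three documents over a three-term vocabulary.
docs = [[1.0, 0.5, 0.0], [0.9, 0.6, 0.1], [0.0, 0.1, 1.0]]
query = [1.0, 0.0, 0.0]
adapted, rel = pso_adapt_query(query, docs)
```

Because the original query is one of the initial particles, the adapted vector can never score lower than the original under this assumed fitness. A term for pseudo non-relevant documents, which the abstract mentions as an extension, could be subtracted inside `fitness`, but how the paper represents those documents is not stated here.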

Similar resources

Learning and optimization of an aspect hidden Markov model for query language model generation

The Relevance Model (RM) incorporates pseudo relevance feedback to derive a query language model and has shown good performance. Generally, it is based on unigram models of individual feedback documents from which query terms are sampled independently. In this paper, we present a new method to build the query model with a latent state machine (LSM) which captures the inherent term dependencies w...

Query expansion based on relevance feedback and latent semantic analysis

Web search engines are one of the most popular tools on the Internet which are widely-used by expert and novice users. Constructing an adequate query which represents the best specification of users’ information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...

Learning to Weight Translations using Ordinal Linear Regression and Query-generated Training Data for Ad-hoc Retrieval with Long Queries

Ordinal regression, also known as learning to rank, has long been used in information retrieval (IR). Learning to rank algorithms have been successfully tailored to document ranking, information filtering, and building large aligned corpora. In this paper, we propose to use this algorithm for query modeling in cross-language environments. To this end, first we build a query-generated train...

Recurrent Pseudo Relevance Feedback on Web Collections

Various Relevance Feedback techniques exist in Information Retrieval such as Simulated Relevance Feedback and Pseudo Relevance Feedback. In a Simulated Relevance Feedback technique a new query is reformulated based on the documents selected by the user from the top-ranked documents whereas in a Pseudo Relevance Feedback, the query is reformulated based on the assumption that N top-ranked docume...

Improved Query Topic Models via Pseudo-Relevant Pólya Document Models

Query-expansion via pseudo-relevance feedback is a popular method of overcoming the problem of vocabulary mismatch and of increasing average retrieval effectiveness. In this paper, we develop a new method that estimates a query topic model from a set of pseudo-relevant documents using a new language modelling framework. We assume that documents are generated via a mixture of multivariate Pólya ...

Publication date: 2012